Dataset statistics
| Number of variables | 15 |
|---|---|
| Number of observations | 4330 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 507.5 KiB |
| Average record size in memory | 120.0 B |
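As a quick consistency check, the average record size should be roughly the total memory divided by the number of observations. The sketch below uses only the figures from the table above and assumes that 1 KiB means 1024 bytes.

```python
# Figures copied from the "Dataset statistics" table above.
total_size_kib = 507.5
n_observations = 4330

# Assumption: 1 KiB = 1024 bytes.
total_bytes = total_size_kib * 1024
avg_record_bytes = total_bytes / n_observations

print(f"{avg_record_bytes:.1f} B per record")  # 120.0 B per record
```

The result also equals 15 variables at 8 bytes each, which is consistent with 64-bit columns, although the report does not state the dtypes.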
Variable types
| Numeric | 14 |
|---|---|
| Categorical | 1 |
| recency is highly correlated with num_purchases and 1 other fields | High correlation |
|---|---|
| avg_days_bw_purchases is highly correlated with num_purchases | High correlation |
| num_purchases is highly correlated with recency and 3 other fields | High correlation |
| frequency is highly correlated with recency and 1 other fields | High correlation |
| revenue is highly correlated with num_purchases and 2 other fields | High correlation |
| avg_ticket is highly correlated with revenue and 1 other fields | High correlation |
| avg_basket_size is highly correlated with revenue and 1 other fields | High correlation |
| returns_revenue is highly correlated with avg_return_revenue and 2 other fields | High correlation |
| avg_return_revenue is highly correlated with returns_revenue and 2 other fields | High correlation |
| num_returns is highly correlated with returns_revenue and 2 other fields | High correlation |
| qty_returned is highly correlated with returns_revenue and 2 other fields | High correlation |
| num_purchases is highly correlated with revenue and 1 other fields | High correlation |
| revenue is highly correlated with num_purchases | High correlation |
| avg_ticket is highly correlated with avg_basket_size | High correlation |
| avg_basket_size is highly correlated with avg_ticket | High correlation |
| returns_revenue is highly correlated with avg_return_revenue and 1 other fields | High correlation |
| avg_return_revenue is highly correlated with returns_revenue and 1 other fields | High correlation |
| num_returns is highly correlated with num_purchases | High correlation |
| qty_returned is highly correlated with returns_revenue and 1 other fields | High correlation |
| num_purchases is highly correlated with revenue | High correlation |
| revenue is highly correlated with num_purchases and 1 other fields | High correlation |
| avg_ticket is highly correlated with revenue and 1 other fields | High correlation |
| avg_basket_size is highly correlated with avg_ticket | High correlation |
| returns_revenue is highly correlated with avg_return_revenue and 2 other fields | High correlation |
| avg_return_revenue is highly correlated with returns_revenue and 2 other fields | High correlation |
| num_returns is highly correlated with returns_revenue and 2 other fields | High correlation |
| qty_returned is highly correlated with returns_revenue and 2 other fields | High correlation |
| customer_id is highly correlated with country | High correlation |
| country is highly correlated with customer_id and 2 other fields | High correlation |
| recency is highly correlated with date_range | High correlation |
| avg_days_bw_purchases is highly correlated with date_range | High correlation |
| num_purchases is highly correlated with revenue and 1 other fields | High correlation |
| date_range is highly correlated with recency and 1 other fields | High correlation |
| revenue is highly correlated with country and 3 other fields | High correlation |
| avg_ticket is highly correlated with revenue and 3 other fields | High correlation |
| avg_basket_size is highly correlated with avg_ticket | High correlation |
| returns_revenue is highly correlated with avg_ticket and 2 other fields | High correlation |
| avg_return_revenue is highly correlated with returns_revenue and 1 other fields | High correlation |
| num_returns is highly correlated with country and 2 other fields | High correlation |
| qty_returned is highly correlated with avg_ticket and 2 other fields | High correlation |
| frequency is highly skewed (γ1 = 58.87555) | Skewed |
| revenue is highly skewed (γ1 = 21.52932119) | Skewed |
| returns_revenue is highly skewed (γ1 = -51.87290832) | Skewed |
| avg_return_revenue is highly skewed (γ1 = -54.11806603) | Skewed |
| qty_returned is highly skewed (γ1 = -44.92623279) | Skewed |
| customer_id has unique values | Unique |
| avg_days_bw_purchases has 1558 (36.0%) zeros | Zeros |
| returns_revenue has 2825 (65.2%) zeros | Zeros |
| avg_return_revenue has 2825 (65.2%) zeros | Zeros |
| num_returns has 2825 (65.2%) zeros | Zeros |
| qty_returned has 2825 (65.2%) zeros | Zeros |
Reproduction
| Analysis started | 2022-02-23 18:41:16.784834 |
|---|---|
| Analysis finished | 2022-02-23 18:42:35.296982 |
| Duration | 1 minute and 18.51 seconds |
| Software version | pandas-profiling v3.1.0 |
| Download configuration | config.json |
customer_id
Real number (ℝ≥0)
| Distinct | 4330 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15300.09561 |
| Minimum | 12346 |
|---|---|
| Maximum | 18287 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.0 KiB |
Quantile statistics
| Minimum | 12346 |
|---|---|
| 5-th percentile | 12615.45 |
| Q1 | 13812.25 |
| median | 15298.5 |
| Q3 | 16779.75 |
| 95-th percentile | 17984.55 |
| Maximum | 18287 |
| Range | 5941 |
| Interquartile range (IQR) | 2967.5 |
Descriptive statistics
| Standard deviation | 1721.908834 |
|---|---|
| Coefficient of variation (CV) | 0.1125423578 |
| Kurtosis | -1.196324722 |
| Mean | 15300.09561 |
| Median Absolute Deviation (MAD) | 1484 |
| Skewness | 0.002170505721 |
| Sum | 66249414 |
| Variance | 2964970.034 |
| Monotonicity | Not monotonic |
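Several of the figures above are simple functions of the others, so they can be cross-checked by hand. The following sketch recomputes the range, IQR, coefficient of variation and variance from the reported minimum, maximum, quartiles, mean and standard deviation; the variable names are illustrative only.

```python
import math

# Values copied from the quantile and descriptive statistics tables above.
minimum, maximum = 12346, 18287
q1, q3 = 13812.25, 16779.75
mean, std = 15300.09561, 1721.908834

value_range = maximum - minimum  # reported Range: 5941
iqr = q3 - q1                    # reported IQR: 2967.5
cv = std / mean                  # reported CV: 0.1125423578
variance = std ** 2              # reported Variance: 2964970.034

print(value_range, iqr)
print(math.isclose(cv, 0.1125423578, rel_tol=1e-6))
print(math.isclose(variance, 2964970.034, rel_tol=1e-6))
```

The same identities (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared) apply to every numeric variable in the report.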
| Value | Count | Frequency (%) |
| 17850 | 1 | < 0.1% |
| 15507 | 1 | < 0.1% |
| 17444 | 1 | < 0.1% |
| 15921 | 1 | < 0.1% |
| 15747 | 1 | < 0.1% |
| 15840 | 1 | < 0.1% |
| 15016 | 1 | < 0.1% |
| 14808 | 1 | < 0.1% |
| 12904 | 1 | < 0.1% |
| 15825 | 1 | < 0.1% |
| Other values (4320) | 4320 |
| Value | Count | Frequency (%) |
| 12346 | 1 | |
| 12347 | 1 | |
| 12348 | 1 | |
| 12349 | 1 | |
| 12350 | 1 | |
| 12352 | 1 | |
| 12353 | 1 | |
| 12354 | 1 | |
| 12355 | 1 | |
| 12356 | 1 |
| Value | Count | Frequency (%) |
| 18287 | 1 | |
| 18283 | 1 | |
| 18282 | 1 | |
| 18281 | 1 | |
| 18280 | 1 | |
| 18278 | 1 | |
| 18277 | 1 | |
| 18276 | 1 | |
| 18274 | 1 | |
| 18273 | 1 |
country
Categorical
| Distinct | 35 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 34.0 KiB |
| United Kingdom | 3917 |
|---|---|
| Germany | 94 |
| France | 87 |
| Spain | 28 |
| Belgium | 24 |
| Other values (30) | 180 |
Length
| Max length | 20 |
|---|---|
| Median length | 14 |
| Mean length | 13.33325635 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 8 |
|---|---|
| Unique (%) | 0.2% |
Sample
| 1st row | United Kingdom |
|---|---|
| 2nd row | United Kingdom |
| 3rd row | France |
| 4th row | United Kingdom |
| 5th row | United Kingdom |
Common Values
| Value | Count | Frequency (%) |
| United Kingdom | 3917 | |
| Germany | 94 | 2.2% |
| France | 87 | 2.0% |
| Spain | 28 | 0.6% |
| Belgium | 24 | 0.6% |
| Switzerland | 20 | 0.5% |
| Portugal | 19 | 0.4% |
| Italy | 14 | 0.3% |
| Finland | 12 | 0.3% |
| Norway | 10 | 0.2% |
| Other values (25) | 105 | 2.4% |
Length
| Value | Count | Frequency (%) |
| united | 3919 | |
| kingdom | 3917 | |
| germany | 94 | 1.1% |
| france | 87 | 1.1% |
| spain | 28 | 0.3% |
| belgium | 24 | 0.3% |
| switzerland | 20 | 0.2% |
| portugal | 19 | 0.2% |
| italy | 14 | 0.2% |
| finland | 12 | 0.1% |
| Other values (30) | 128 | 1.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| No values found. | | |
Most occurring categories
| Value | Count | Frequency (%) |
| No values found. | | |
Most frequent character per category
Most occurring scripts
| Value | Count | Frequency (%) |
| No values found. | | |
Most frequent character per script
Most occurring blocks
| Value | Count | Frequency (%) |
| No values found. | | |
Most frequent character per block
recency
Real number (ℝ≥0)
| Distinct | 304 |
|---|---|
| Distinct (%) | 7.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 93.16143187 |
| Minimum | 1 |
|---|---|
| Maximum | 374 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 18 |
| median | 51 |
| Q3 | 143 |
| 95-th percentile | 312 |
| Maximum | 374 |
| Range | 373 |
| Interquartile range (IQR) | 125 |
Descriptive statistics
| Standard deviation | 100.2158933 |
|---|---|
| Coefficient of variation (CV) | 1.075722982 |
| Kurtosis | 0.4189044719 |
| Mean | 93.16143187 |
| Median Absolute Deviation (MAD) | 40 |
| Skewness | 1.244004492 |
| Sum | 403389 |
| Variance | 10043.22526 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 103 | 2.4% |
| 4 | 94 | 2.2% |
| 5 | 94 | 2.2% |
| 3 | 89 | 2.1% |
| 9 | 79 | 1.8% |
| 11 | 77 | 1.8% |
| 18 | 74 | 1.7% |
| 8 | 72 | 1.7% |
| 10 | 70 | 1.6% |
| 16 | 64 | 1.5% |
| Other values (294) | 3514 |
| Value | Count | Frequency (%) |
| 1 | 35 | 0.8% |
| 2 | 103 | |
| 3 | 89 | |
| 4 | 94 | |
| 5 | 94 | |
| 6 | 48 | |
| 8 | 72 | |
| 9 | 79 | |
| 10 | 70 | |
| 11 | 77 |
| Value | Count | Frequency (%) |
| 374 | 17 | |
| 373 | 17 | |
| 372 | 6 | 0.1% |
| 370 | 3 | 0.1% |
| 369 | 5 | 0.1% |
| 368 | 5 | 0.1% |
| 367 | 10 | |
| 366 | 10 | |
| 365 | 6 | 0.1% |
| 363 | 6 | 0.1% |
avg_days_bw_purchases
Real number (ℝ≥0)
| Distinct | 1155 |
|---|---|
| Distinct (%) | 26.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 50.4575875 |
| Minimum | 0 |
|---|---|
| Maximum | 366 |
| Zeros | 1558 |
| Zeros (%) | 36.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 31 |
| Q3 | 73.15 |
| 95-th percentile | 184 |
| Maximum | 366 |
| Range | 366 |
| Interquartile range (IQR) | 73.15 |
Descriptive statistics
| Standard deviation | 65.3019421 |
|---|---|
| Coefficient of variation (CV) | 1.294194696 |
| Kurtosis | 4.652342813 |
| Mean | 50.4575875 |
| Median Absolute Deviation (MAD) | 31 |
| Skewness | 1.989512758 |
| Sum | 218481.3539 |
| Variance | 4264.343642 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 1558 | |
| 70 | 21 | 0.5% |
| 46 | 18 | 0.4% |
| 55 | 17 | 0.4% |
| 49 | 16 | 0.4% |
| 31 | 16 | 0.4% |
| 91 | 16 | 0.4% |
| 42 | 15 | 0.3% |
| 21 | 15 | 0.3% |
| 35 | 15 | 0.3% |
| Other values (1145) | 2623 |
| Value | Count | Frequency (%) |
| 0 | 1558 | |
| 1 | 9 | 0.2% |
| 2 | 4 | 0.1% |
| 2.861538462 | 1 | < 0.1% |
| 3 | 6 | 0.1% |
| 3.330357143 | 1 | < 0.1% |
| 3.351351351 | 1 | < 0.1% |
| 4 | 4 | 0.1% |
| 4.191011236 | 1 | < 0.1% |
| 4.275862069 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 366 | 1 | < 0.1% |
| 365 | 1 | < 0.1% |
| 364 | 1 | < 0.1% |
| 363 | 1 | < 0.1% |
| 357 | 2 | |
| 356 | 1 | < 0.1% |
| 355 | 2 | |
| 352 | 1 | < 0.1% |
| 351 | 2 | |
| 350 | 3 |
num_purchases
Real number (ℝ≥0)
| Distinct | 56 |
|---|---|
| Distinct (%) | 1.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.248729792 |
| Minimum | 1 |
|---|---|
| Maximum | 206 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 2 |
| Q3 | 5 |
| 95-th percentile | 13 |
| Maximum | 206 |
| Range | 205 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 7.647964746 |
|---|---|
| Coefficient of variation (CV) | 1.800059105 |
| Kurtosis | 244.8058113 |
| Mean | 4.248729792 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 11.96826848 |
| Sum | 18397 |
| Variance | 58.49136475 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 1506 | |
| 2 | 827 | |
| 3 | 501 | 11.6% |
| 4 | 394 | 9.1% |
| 5 | 237 | 5.5% |
| 6 | 173 | 4.0% |
| 7 | 139 | 3.2% |
| 8 | 98 | 2.3% |
| 9 | 68 | 1.6% |
| 10 | 55 | 1.3% |
| Other values (46) | 332 | 7.7% |
| Value | Count | Frequency (%) |
| 1 | 1506 | |
| 2 | 827 | |
| 3 | 501 | 11.6% |
| 4 | 394 | 9.1% |
| 5 | 237 | 5.5% |
| 6 | 173 | 4.0% |
| 7 | 139 | 3.2% |
| 8 | 98 | 2.3% |
| 9 | 68 | 1.6% |
| 10 | 55 | 1.3% |
| Value | Count | Frequency (%) |
| 206 | 1 | |
| 199 | 1 | |
| 124 | 1 | |
| 97 | 1 | |
| 91 | 1 | |
| 90 | 1 | |
| 86 | 1 | |
| 73 | 1 | |
| 62 | 2 | |
| 60 | 1 |
date_range
Real number (ℝ≥0)
| Distinct | 374 |
|---|---|
| Distinct (%) | 8.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 186.926097 |
| Minimum | 1 |
|---|---|
| Maximum | 374 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 19 |
| Q1 | 75 |
| median | 190 |
| Q3 | 286 |
| 95-th percentile | 363 |
| Maximum | 374 |
| Range | 373 |
| Interquartile range (IQR) | 211 |
Descriptive statistics
| Standard deviation | 115.0295314 |
|---|---|
| Coefficient of variation (CV) | 0.6153743821 |
| Kurtosis | -1.33137384 |
| Mean | 186.926097 |
| Median Absolute Deviation (MAD) | 106.5 |
| Skewness | 0.01784712628 |
| Sum | 809390 |
| Variance | 13231.7931 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 366 | 30 | 0.7% |
| 364 | 28 | 0.6% |
| 25 | 26 | 0.6% |
| 350 | 26 | 0.6% |
| 65 | 26 | 0.6% |
| 267 | 25 | 0.6% |
| 357 | 25 | 0.6% |
| 31 | 25 | 0.6% |
| 365 | 24 | 0.6% |
| 54 | 24 | 0.6% |
| Other values (364) | 4071 |
| Value | Count | Frequency (%) |
| 1 | 10 | |
| 2 | 8 | |
| 3 | 10 | |
| 4 | 12 | |
| 5 | 10 | |
| 6 | 7 | |
| 7 | 7 | |
| 8 | 12 | |
| 9 | 6 | |
| 10 | 8 |
| Value | Count | Frequency (%) |
| 374 | 17 | |
| 373 | 21 | |
| 372 | 14 | |
| 371 | 8 | 0.2% |
| 370 | 11 | 0.3% |
| 369 | 11 | 0.3% |
| 368 | 15 | |
| 367 | 20 | |
| 366 | 30 | |
| 365 | 24 |
frequency
Real number (ℝ≥0)
| Distinct | 1403 |
|---|---|
| Distinct (%) | 32.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.04742032792 |
| Minimum | 0.002673796791 |
|---|---|
| Maximum | 34 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.0 KiB |
Quantile statistics
| Minimum | 0.002673796791 |
|---|---|
| 5-th percentile | 0.003283541846 |
| Q1 | 0.01025641026 |
| median | 0.01928374656 |
| Q3 | 0.03559951209 |
| 95-th percentile | 0.1018023643 |
| Maximum | 34 |
| Range | 33.9973262 |
| Interquartile range (IQR) | 0.02534310183 |
Descriptive statistics
| Standard deviation | 0.5371009394 |
|---|---|
| Coefficient of variation (CV) | 11.32638602 |
| Kurtosis | 3695.324789 |
| Mean | 0.04742032792 |
| Median Absolute Deviation (MAD) | 0.01151511188 |
| Skewness | 58.87555 |
| Sum | 205.3300199 |
| Variance | 0.2884774192 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.03846153846 | 27 | 0.6% |
| 0.01851851852 | 27 | 0.6% |
| 0.03225806452 | 24 | 0.6% |
| 0.01538461538 | 23 | 0.5% |
| 0.01960784314 | 23 | 0.5% |
| 0.05263157895 | 22 | 0.5% |
| 0.02127659574 | 22 | 0.5% |
| 0.01639344262 | 22 | 0.5% |
| 0.01923076923 | 21 | 0.5% |
| 0.03125 | 20 | 0.5% |
| Other values (1393) | 4099 |
| Value | Count | Frequency (%) |
| 0.002673796791 | 16 | |
| 0.002680965147 | 16 | |
| 0.002688172043 | 6 | 0.1% |
| 0.002702702703 | 2 | < 0.1% |
| 0.0027100271 | 5 | 0.1% |
| 0.002717391304 | 5 | 0.1% |
| 0.00272479564 | 9 | |
| 0.002732240437 | 10 | |
| 0.002739726027 | 6 | 0.1% |
| 0.002754820937 | 6 | 0.1% |
| Value | Count | Frequency (%) |
| 34 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
| 2 | 6 | |
| 1.5 | 1 | < 0.1% |
| 1.333333333 | 2 | < 0.1% |
| 1 | 6 | |
| 0.6666666667 | 3 | |
| 0.5522788204 | 1 | < 0.1% |
| 0.5349462366 | 1 | < 0.1% |
revenue
Real number (ℝ≥0)
| Distinct | 4234 |
|---|---|
| Distinct (%) | 97.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1915.390737 |
| Minimum | 0 |
|---|---|
| Maximum | 278778.02 |
| Zeros | 10 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 108.434 |
| Q1 | 298.53 |
| median | 652.77 |
| Q3 | 1609.3775 |
| 95-th percentile | 5648.83 |
| Maximum | 278778.02 |
| Range | 278778.02 |
| Interquartile range (IQR) | 1310.8475 |
Descriptive statistics
| Standard deviation | 8311.858379 |
|---|---|
| Coefficient of variation (CV) | 4.339510587 |
| Kurtosis | 597.0067526 |
| Mean | 1915.390737 |
| Median Absolute Deviation (MAD) | 454.405 |
| Skewness | 21.52932119 |
| Sum | 8293641.89 |
| Variance | 69086989.72 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 10 | 0.2% |
| 76.32 | 4 | 0.1% |
| 113.5 | 3 | 0.1% |
| 35.4 | 3 | 0.1% |
| 15 | 3 | 0.1% |
| 440 | 3 | 0.1% |
| 363.65 | 3 | 0.1% |
| 79.2 | 3 | 0.1% |
| 590 | 2 | < 0.1% |
| 144 | 2 | < 0.1% |
| Other values (4224) | 4294 |
| Value | Count | Frequency (%) |
| 0 | 10 | |
| 1.776356839 × 10⁻¹⁵ | 1 | < 0.1% |
| 3.552713679 × 10⁻¹⁵ | 2 | < 0.1% |
| 1.065814104 × 10⁻¹⁴ | 1 | < 0.1% |
| 5.684341886 × 10⁻¹⁴ | 1 | < 0.1% |
| 2.9 | 1 | < 0.1% |
| 3.75 | 1 | < 0.1% |
| 5.9 | 1 | < 0.1% |
| 12.24 | 1 | < 0.1% |
| 12.75 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 278778.02 | 1 | |
| 259657.3 | 1 | |
| 189735.53 | 1 | |
| 133007.13 | 1 | |
| 123638.18 | 1 | |
| 114505.32 | 1 | |
| 88138.2 | 1 | |
| 65920.12 | 1 | |
| 62924.1 | 1 | |
| 59419.34 | 1 |
avg_ticket
Real number (ℝ≥0)
| Distinct | 4228 |
|---|---|
| Distinct (%) | 97.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 369.8508748 |
| Minimum | 0 |
|---|---|
| Maximum | 13206.5 |
| Zeros | 10 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 84.73558333 |
| Q1 | 173.87875 |
| median | 282.05875 |
| Q3 | 420.8925 |
| 95-th percentile | 890.817375 |
| Maximum | 13206.5 |
| Range | 13206.5 |
| Interquartile range (IQR) | 247.01375 |
Descriptive statistics
| Standard deviation | 464.7804866 |
|---|---|
| Coefficient of variation (CV) | 1.256669967 |
| Kurtosis | 202.4962757 |
| Mean | 369.8508748 |
| Median Absolute Deviation (MAD) | 118.0933333 |
| Skewness | 10.63604992 |
| Sum | 1601454.288 |
| Variance | 216020.9007 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 10 | 0.2% |
| 76.32 | 4 | 0.1% |
| 79.2 | 3 | 0.1% |
| 120 | 3 | 0.1% |
| 440 | 3 | 0.1% |
| 113.5 | 3 | 0.1% |
| 35.4 | 3 | 0.1% |
| 91.8 | 2 | < 0.1% |
| 299.75 | 2 | < 0.1% |
| 145.9 | 2 | < 0.1% |
| Other values (4218) | 4295 |
| Value | Count | Frequency (%) |
| 0 | 10 | |
| 1.776356839 × 10⁻¹⁵ | 1 | < 0.1% |
| 3.552713679 × 10⁻¹⁵ | 2 | < 0.1% |
| 1.065814104 × 10⁻¹⁴ | 1 | < 0.1% |
| 5.684341886 × 10⁻¹⁴ | 1 | < 0.1% |
| 1.45 | 1 | < 0.1% |
| 3.75 | 1 | < 0.1% |
| 5.9 | 1 | < 0.1% |
| 7.5 | 1 | < 0.1% |
| 9.14 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 13206.5 | 1 | |
| 9338.38 | 1 | |
| 7178.633333 | 1 | |
| 6207.67 | 1 | |
| 6181.909 | 1 | |
| 4873.81 | 1 | |
| 4366.78 | 1 | |
| 4327.621667 | 1 | |
| 4314.72 | 1 | |
| 4151.26 | 1 |
avg_basket_size
Real number (ℝ)
| Distinct | 2248 |
|---|---|
| Distinct (%) | 51.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 202.2964272 |
| Minimum | -101 |
|---|---|
| Maximum | 12540 |
| Zeros | 12 |
| Zeros (%) | 0.3% |
| Negative | 2 |
| Negative (%) | < 0.1% |
| Memory size | 34.0 KiB |
Quantile statistics
| Minimum | -101 |
|---|---|
| 5-th percentile | 29 |
| Q1 | 79.72321429 |
| median | 139.5 |
| Q3 | 236 |
| 95-th percentile | 524.1833333 |
| Maximum | 12540 |
| Range | 12641 |
| Interquartile range (IQR) | 156.2767857 |
Descriptive statistics
| Standard deviation | 328.4922066 |
|---|---|
| Coefficient of variation (CV) | 1.623816155 |
| Kurtosis | 546.3530981 |
| Mean | 202.2964272 |
| Median Absolute Deviation (MAD) | 71 |
| Skewness | 17.76525474 |
| Sum | 875943.5298 |
| Variance | 107907.1298 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 120 | 19 | 0.4% |
| 72 | 18 | 0.4% |
| 64 | 17 | 0.4% |
| 44 | 16 | 0.4% |
| 136 | 16 | 0.4% |
| 144 | 16 | 0.4% |
| 146 | 15 | 0.3% |
| 60 | 15 | 0.3% |
| 88 | 15 | 0.3% |
| 106 | 15 | 0.3% |
| Other values (2238) | 4168 |
| Value | Count | Frequency (%) |
| -101 | 1 | < 0.1% |
| -44 | 1 | < 0.1% |
| 0 | 12 | |
| 0.25 | 1 | < 0.1% |
| 0.6666666667 | 1 | < 0.1% |
| 1 | 2 | < 0.1% |
| 2 | 4 | 0.1% |
| 3 | 4 | 0.1% |
| 3.333333333 | 1 | < 0.1% |
| 4 | 7 |
| Value | Count | Frequency (%) |
| 12540 | 1 | |
| 7824 | 1 | |
| 4300 | 1 | |
| 4280 | 1 | |
| 3218.416667 | 1 | |
| 3028 | 1 | |
| 2924 | 1 | |
| 2880 | 1 | |
| 2708 | 1 | |
| 2663.945946 | 1 |
avg_unique_prods
Real number (ℝ≥0)
| Distinct | 1001 |
|---|---|
| Distinct (%) | 23.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 21.61451109 |
| Minimum | 1 |
|---|---|
| Maximum | 297.8823529 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2.833333333 |
| Q1 | 9.333333333 |
| median | 16.85164835 |
| Q3 | 27.75 |
| 95-th percentile | 56 |
| Maximum | 297.8823529 |
| Range | 296.8823529 |
| Interquartile range (IQR) | 18.41666667 |
Descriptive statistics
| Standard deviation | 19.46549533 |
|---|---|
| Coefficient of variation (CV) | 0.9005753239 |
| Kurtosis | 23.82936249 |
| Mean | 21.61451109 |
| Median Absolute Deviation (MAD) | 8.451648352 |
| Skewness | 3.295843514 |
| Sum | 93590.83303 |
| Variance | 378.9055084 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 101 | 2.3% |
| 13 | 98 | 2.3% |
| 10 | 88 | 2.0% |
| 11 | 82 | 1.9% |
| 9 | 81 | 1.9% |
| 14 | 74 | 1.7% |
| 7 | 73 | 1.7% |
| 6 | 72 | 1.7% |
| 8 | 72 | 1.7% |
| 5 | 71 | 1.6% |
| Other values (991) | 3518 |
| Value | Count | Frequency (%) |
| 1 | 101 | |
| 1.2 | 1 | < 0.1% |
| 1.25 | 1 | < 0.1% |
| 1.333333333 | 2 | < 0.1% |
| 1.5 | 9 | 0.2% |
| 1.545454545 | 1 | < 0.1% |
| 1.571428571 | 1 | < 0.1% |
| 1.666666667 | 4 | 0.1% |
| 1.833333333 | 1 | < 0.1% |
| 1.888888889 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 297.8823529 | 1 | |
| 259 | 1 | |
| 219 | 1 | |
| 191 | 1 | |
| 171 | 1 | |
| 155 | 1 | |
| 153 | 1 | |
| 148 | 2 | |
| 141 | 1 | |
| 135.3333333 | 1 |
returns_revenue
Real number (ℝ)
HIGH CORRELATION, HIGH CORRELATION, HIGH CORRELATION, HIGH CORRELATION, SKEWED, ZEROS
| Distinct | 1064 |
|---|---|
| Distinct (%) | 24.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -108.6655543 |
| Minimum | -168469.6 |
|---|---|
| Maximum | 0 |
| Zeros | 2825 |
| Zeros (%) | 65.2% |
| Negative | 1505 |
| Negative (%) | 34.8% |
| Memory size | 34.0 KiB |
Quantile statistics
| Minimum | -168469.6 |
|---|---|
| 5-th percentile | -147.2635 |
| Q1 | -14.835 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 0 |
| Range | 168469.6 |
| Interquartile range (IQR) | 14.835 |
Descriptive statistics
| Standard deviation | 2859.293216 |
|---|---|
| Coefficient of variation (CV) | -26.31278361 |
| Kurtosis | 2900.915238 |
| Mean | -108.6655543 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -51.87290832 |
| Sum | -470521.85 |
| Variance | 8175557.693 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 2825 | |
| -12.75 | 20 | 0.5% |
| -4.95 | 19 | 0.4% |
| -15 | 17 | 0.4% |
| -9.95 | 17 | 0.4% |
| -5.9 | 12 | 0.3% |
| -25.5 | 11 | 0.3% |
| -19.8 | 10 | 0.2% |
| -4.25 | 10 | 0.2% |
| -3.75 | 9 | 0.2% |
| Other values (1054) | 1380 |
| Value | Count | Frequency (%) |
| -168469.6 | 1 | |
| -77183.6 | 1 | |
| -22998.4 | 1 | |
| -14688.24 | 1 | |
| -8511.15 | 1 | |
| -7443.59 | 1 | |
| -5228.4 | 1 | |
| -4815.26 | 1 | |
| -4814.74 | 1 | |
| -4486.24 | 1 |
| Value | Count | Frequency (%) |
| 0 | 2825 | |
| -0.42 | 2 | < 0.1% |
| -0.65 | 1 | < 0.1% |
| -0.95 | 1 | < 0.1% |
| -1.25 | 4 | 0.1% |
| -1.45 | 4 | 0.1% |
| -1.64 | 1 | < 0.1% |
| -1.65 | 5 | 0.1% |
| -1.7 | 2 | < 0.1% |
| -1.79 | 1 | < 0.1% |
avg_return_revenue
Real number (ℝ)
HIGH CORRELATION, HIGH CORRELATION, HIGH CORRELATION, HIGH CORRELATION, SKEWED, ZEROS
| Distinct | 1110 |
|---|---|
| Distinct (%) | 25.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -66.59022209 |
| Minimum | -168469.6 |
|---|---|
| Maximum | 0 |
| Zeros | 2825 |
| Zeros (%) | 65.2% |
| Negative | 1505 |
| Negative (%) | 34.8% |
| Memory size | 34.0 KiB |
Quantile statistics
| Minimum | -168469.6 |
|---|---|
| 5-th percentile | -30 |
| Q1 | -6.6125 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 0 |
| Range | 168469.6 |
| Interquartile range (IQR) | 6.6125 |
Descriptive statistics
| Standard deviation | 2816.97859 |
|---|---|
| Coefficient of variation (CV) | -42.30318658 |
| Kurtosis | 3081.3908 |
| Mean | -66.59022209 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -54.11806603 |
| Sum | -288335.6616 |
| Variance | 7935368.375 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 2825 | |
| -12.75 | 23 | 0.5% |
| -4.95 | 21 | 0.5% |
| -9.95 | 20 | 0.5% |
| -15 | 17 | 0.4% |
| -3.75 | 10 | 0.2% |
| -4.25 | 10 | 0.2% |
| -17 | 9 | 0.2% |
| -7.5 | 9 | 0.2% |
| -5.9 | 9 | 0.2% |
| Other values (1100) | 1377 |
| Value | Count | Frequency (%) |
| -168469.6 | 1 | |
| -77183.6 | 1 | |
| -4599.68 | 1 | |
| -1605.086667 | 1 | |
| -1591.2 | 1 | |
| -833.25 | 1 | |
| -687.82 | 1 | |
| -638.6191304 | 1 | |
| -594 | 1 | |
| -581.4 | 1 |
| Value | Count | Frequency (%) |
| 0 | 2825 | |
| -0.42 | 2 | < 0.1% |
| -0.65 | 1 | < 0.1% |
| -0.82 | 1 | < 0.1% |
| -0.95 | 1 | < 0.1% |
| -1.05 | 1 | < 0.1% |
| -1.075 | 1 | < 0.1% |
| -1.116666667 | 1 | < 0.1% |
| -1.25 | 5 | 0.1% |
| -1.38 | 1 | < 0.1% |
num_returns
Real number (ℝ≥0)
HIGH CORRELATION, HIGH CORRELATION, HIGH CORRELATION, HIGH CORRELATION, ZEROS
| Distinct | 58 |
|---|---|
| Distinct (%) | 1.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.949191686 |
| Minimum | 0 |
|---|---|
| Maximum | 223 |
| Zeros | 2825 |
| Zeros (%) | 65.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 9 |
| Maximum | 223 |
| Range | 223 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 7.219247747 |
|---|---|
| Coefficient of variation (CV) | 3.703713596 |
| Kurtosis | 284.2138811 |
| Mean | 1.949191686 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 13.27905964 |
| Sum | 8440 |
| Variance | 52.11753804 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 2825 | |
| 1 | 476 | 11.0% |
| 2 | 284 | 6.6% |
| 3 | 172 | 4.0% |
| 4 | 117 | 2.7% |
| 5 | 83 | 1.9% |
| 6 | 53 | 1.2% |
| 7 | 52 | 1.2% |
| 8 | 41 | 0.9% |
| 11 | 24 | 0.6% |
| Other values (48) | 203 | 4.7% |
| Value | Count | Frequency (%) |
| 0 | 2825 | |
| 1 | 476 | 11.0% |
| 2 | 284 | 6.6% |
| 3 | 172 | 4.0% |
| 4 | 117 | 2.7% |
| 5 | 83 | 1.9% |
| 6 | 53 | 1.2% |
| 7 | 52 | 1.2% |
| 8 | 41 | 0.9% |
| 9 | 18 | 0.4% |
| Value | Count | Frequency (%) |
| 223 | 1 | |
| 133 | 1 | |
| 112 | 1 | |
| 111 | 1 | |
| 101 | 1 | |
| 92 | 1 | |
| 90 | 1 | |
| 81 | 1 | |
| 78 | 1 | |
| 70 | 1 |
qty_returned
Real number (ℝ)
HIGH CORRELATION, HIGH CORRELATION, HIGH CORRELATION, HIGH CORRELATION, SKEWED, ZEROS
| Distinct | 216 |
|---|---|
| Distinct (%) | 5.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -61.92748268 |
| Minimum | -80995 |
|---|---|
| Maximum | 0 |
| Zeros | 2825 |
| Zeros (%) | 65.2% |
| Negative | 1505 |
| Negative (%) | 34.8% |
| Memory size | 34.0 KiB |
Quantile statistics
| Minimum | -80995 |
|---|---|
| 5-th percentile | -59.55 |
| Q1 | -3 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 0 |
| Range | 80995 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1691.089558 |
|---|---|
| Coefficient of variation (CV) | -27.30757792 |
| Kurtosis | 2066.254091 |
| Mean | -61.92748268 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -44.92623279 |
| Sum | -268146 |
| Variance | 2859783.894 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 2825 | |
| -1 | 169 | 3.9% |
| -2 | 150 | 3.5% |
| -3 | 105 | 2.4% |
| -4 | 89 | 2.1% |
| -6 | 78 | 1.8% |
| -5 | 61 | 1.4% |
| -12 | 52 | 1.2% |
| -7 | 44 | 1.0% |
| -8 | 43 | 1.0% |
| Other values (206) | 714 | 16.5% |
| Value | Count | Frequency (%) |
| -80995 | 1 | |
| -74215 | 1 | |
| -9360 | 1 | |
| -9014 | 1 | |
| -8004 | 1 | |
| -4427 | 1 | |
| -3768 | 1 | |
| -3332 | 1 | |
| -2878 | 1 | |
| -2022 | 1 |
| Value | Count | Frequency (%) |
| 0 | 2825 | |
| -1 | 169 | 3.9% |
| -2 | 150 | 3.5% |
| -3 | 105 | 2.4% |
| -4 | 89 | 2.1% |
| -5 | 61 | 1.4% |
| -6 | 78 | 1.8% |
| -7 | 44 | 1.0% |
| -8 | 43 | 1.0% |
| -9 | 41 | 0.9% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
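The definition can be sketched in plain Python: rank each variable, then apply the covariance-over-standard-deviations formula to the ranks. This is an illustrative sketch, not the code the report itself runs; the helper names `average_ranks`, `pearson` and `spearman` are hypothetical, and ties are assumed to receive the mean of their rank positions.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# Illustrative data: y = x**3 is monotonic in x but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman(x, y))  # approximately 1
print(pearson(x, y))   # noticeably below 1
```

On this example ρ is 1 up to floating-point rounding because the relationship is perfectly monotonic, whereas r falls short of 1 because it is not linear.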
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
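A minimal sketch of this formula, together with a check of the invariance property, is shown below. The data are made up for illustration and the function name `pearson_r` is hypothetical; the scale factors are assumed to be positive, since a negative factor would flip the sign of r.

```python
from statistics import mean

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    # The 1/n factors in the covariance and the standard deviations cancel.
    return cov / (sx * sy)

# Illustrative data, not taken from the report.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print(abs(r - r_transformed) < 1e-9)  # True
```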
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
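The pair-counting definition above corresponds to the variant usually called τ-a. A small sketch follows, with the hypothetical function name `kendall_tau_a`; statistical libraries often apply a tie correction instead, so their values may differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 is a tie: counted in the total but in neither group
    return (concordant - discordant) / len(pairs)

# Illustrative data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(kendall_tau_a(x, y))  # (5 - 1) / 6, about 0.667
```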
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
First rows
| | customer_id | country | recency | avg_days_bw_purchases | num_purchases | date_range | frequency | revenue | avg_ticket | avg_basket_size | avg_unique_prods | returns_revenue | avg_return_revenue | num_returns | qty_returned |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17850 | United Kingdom | 373 | 1.000000 | 34 | 1 | 34.000000 | 5288.63 | 155.547941 | 48.371429 | 8.735294 | -102.58 | -6.838667 | 15.0 | -40.0 |
| 1 | 13047 | United Kingdom | 57 | 52.833333 | 9 | 317 | 0.028391 | 3089.10 | 343.233333 | 84.687500 | 19.000000 | -143.49 | -6.238696 | 23.0 | -35.0 |
| 2 | 12583 | France | 3 | 26.500000 | 15 | 371 | 0.040431 | 6629.34 | 441.956000 | 292.823529 | 15.466667 | -76.04 | -25.346667 | 3.0 | -50.0 |
| 3 | 13748 | United Kingdom | 96 | 92.666667 | 5 | 278 | 0.017986 | 948.25 | 189.650000 | 87.800000 | 5.600000 | 0.00 | 0.000000 | 0.0 | 0.0 |
| 4 | 15100 | United Kingdom | 334 | 20.000000 | 3 | 40 | 0.075000 | 635.10 | 211.700000 | 9.666667 | 1.000000 | -240.90 | -80.300000 | 3.0 | -22.0 |
| 5 | 15291 | United Kingdom | 26 | 26.769231 | 14 | 348 | 0.040230 | 4551.51 | 325.107857 | 109.105263 | 7.285714 | -71.79 | -11.965000 | 6.0 | -29.0 |
| 6 | 14688 | United Kingdom | 8 | 19.263158 | 21 | 366 | 0.057377 | 5107.38 | 243.208571 | 119.333333 | 15.285714 | -523.49 | -16.359063 | 32.0 | -399.0 |
| 7 | 17809 | United Kingdom | 17 | 39.666667 | 12 | 357 | 0.033613 | 5344.85 | 445.404167 | 144.000000 | 5.083333 | -67.06 | -33.530000 | 2.0 | -41.0 |
| 8 | 15311 | United Kingdom | 1 | 4.191011 | 91 | 373 | 0.243968 | 59419.34 | 652.959780 | 319.661017 | 25.901099 | -1348.56 | -12.040714 | 112.0 | -474.0 |
| 9 | 16098 | United Kingdom | 88 | 47.666667 | 7 | 286 | 0.024476 | 2005.63 | 286.518571 | 87.571429 | 9.428571 | 0.00 | 0.000000 | 0.0 | 0.0 |
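The sample rows suggest how some columns may have been derived, which would also explain several of the high-correlation alerts. The sketch below tests two hypotheses against rows 0 to 4 of the table above: that frequency equals num_purchases divided by date_range, and that avg_ticket equals revenue divided by num_purchases. Both relationships are inferred from the displayed values and are not documented in the report.

```python
import math

# (num_purchases, date_range, frequency, revenue, avg_ticket) for rows 0-4 above.
rows = [
    (34, 1, 34.000000, 5288.63, 155.547941),
    (9, 317, 0.028391, 3089.10, 343.233333),
    (15, 371, 0.040431, 6629.34, 441.956000),
    (5, 278, 0.017986, 948.25, 189.650000),
    (3, 40, 0.075000, 635.10, 211.700000),
]

checks = []
for num_purchases, date_range, frequency, revenue, avg_ticket in rows:
    # Hypothesis 1 (assumption): frequency = num_purchases / date_range
    freq_ok = math.isclose(num_purchases / date_range, frequency, abs_tol=1e-6)
    # Hypothesis 2 (assumption): avg_ticket = revenue / num_purchases
    ticket_ok = math.isclose(revenue / num_purchases, avg_ticket, abs_tol=1e-6)
    checks.append((freq_ok, ticket_ok))

print(all(f and t for f, t in checks))  # True
```

The tolerance of 1e-6 reflects the six decimal places shown in the table; agreement on five rows is suggestive rather than proof that the columns were built this way.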
Last rows
| | customer_id | country | recency | avg_days_bw_purchases | num_purchases | date_range | frequency | revenue | avg_ticket | avg_basket_size | avg_unique_prods | returns_revenue | avg_return_revenue | num_returns | qty_returned |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4320 | 16000 | United Kingdom | 3 | 0.0 | 3 | 3 | 1.000000 | 12393.70 | 4131.233333 | 1703.333333 | 3.0 | 0.00 | 0.00 | 0.0 | 0.0 |
| 4321 | 15195 | United Kingdom | 3 | 0.0 | 1 | 3 | 0.333333 | 3861.00 | 3861.000000 | 1404.000000 | 1.0 | 0.00 | 0.00 | 0.0 | 0.0 |
| 4322 | 14087 | United Kingdom | 3 | 0.0 | 1 | 3 | 0.333333 | 181.67 | 181.670000 | 125.000000 | 61.0 | -12.75 | -12.75 | 1.0 | -1.0 |
| 4323 | 14204 | United Kingdom | 3 | 0.0 | 1 | 3 | 0.333333 | 161.03 | 161.030000 | 82.000000 | 36.0 | 0.00 | 0.00 | 0.0 | 0.0 |
| 4324 | 15471 | United Kingdom | 3 | 0.0 | 1 | 3 | 0.333333 | 469.48 | 469.480000 | 266.000000 | 67.0 | 0.00 | 0.00 | 0.0 | 0.0 |
| 4325 | 13436 | United Kingdom | 2 | 0.0 | 1 | 2 | 0.500000 | 196.89 | 196.890000 | 76.000000 | 12.0 | 0.00 | 0.00 | 0.0 | 0.0 |
| 4326 | 15520 | United Kingdom | 2 | 0.0 | 1 | 2 | 0.500000 | 343.50 | 343.500000 | 314.000000 | 18.0 | 0.00 | 0.00 | 0.0 | 0.0 |
| 4327 | 13298 | United Kingdom | 2 | 0.0 | 1 | 2 | 0.500000 | 360.00 | 360.000000 | 96.000000 | 2.0 | 0.00 | 0.00 | 0.0 | 0.0 |
| 4328 | 14569 | United Kingdom | 2 | 0.0 | 1 | 2 | 0.500000 | 227.39 | 227.390000 | 79.000000 | 10.0 | 0.00 | 0.00 | 0.0 | 0.0 |
| 4329 | 12713 | Germany | 1 | 0.0 | 1 | 1 | 1.000000 | 794.55 | 794.550000 | 505.000000 | 37.0 | 0.00 | 0.00 | 0.0 | 0.0 |